
    EliXR-TIME: A Temporal Knowledge Representation for Clinical Research Eligibility Criteria.

    Effective clinical text processing requires accurate extraction and representation of temporal expressions. Multiple temporal information extraction models have been developed, but a similar need remains for extracting temporal expressions from eligibility criteria (e.g., for eligibility determination). We identified the temporal knowledge representation requirements of eligibility criteria by reviewing 100 temporal criteria. We developed EliXR-TIME, a frame-based representation designed to support semantic annotation of temporal expressions in eligibility criteria by reusing applicable classes from well-known clinical temporal knowledge representations. We used EliXR-TIME to analyze a training set of 50 new temporal eligibility criteria. We evaluated EliXR-TIME on an additional random sample of 20 eligibility criteria containing temporal expressions and with no overlap with the training data, obtaining 92.7% (76/82) inter-coder agreement on sentence chunking and 72% (72/100) agreement on semantic annotation. We conclude that this knowledge representation can facilitate semantic annotation of temporal expressions in eligibility criteria.
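
    The agreement figures above are ratios of agreed items to total items. The sketch below reproduces that arithmetic, assuming (the abstract does not say) that they are raw observed percent agreement rather than a chance-corrected statistic such as Cohen's kappa; the function name percent_agreement is hypothetical.

```python
def percent_agreement(agreed: int, total: int) -> float:
    """Raw observed agreement as a percentage (assumed, not chance-corrected)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * agreed / total

# Counts as reported in the EliXR-TIME abstract
chunking = percent_agreement(76, 82)     # sentence chunking
annotation = percent_agreement(72, 100)  # semantic annotation
print(f"chunking: {chunking:.1f}%, annotation: {annotation:.1f}%")
# → chunking: 92.7%, annotation: 72.0%
```

    A kappa-style statistic would additionally need each coder's label distribution, which the abstract does not report.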

    Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT

    We hypothesize that large language models (LLMs) based on the transformer architecture can enable automated detection of clinical phenotype terms, including terms not documented in the Human Phenotype Ontology (HPO). In this study, we developed two types of models: PhenoBCBERT, a BERT-based model that uses Bio+Clinical BERT as its pre-trained model, and PhenoGPT, a GPT-based model that can be initialized from diverse GPT models, including open-source models such as GPT-J, Falcon, and LLaMA, and closed-source models such as GPT-3 and GPT-3.5. We compared our methods with PhenoTagger, a recently developed HPO recognition tool that combines rule-based and deep learning methods. We found that our methods can extract more phenotype concepts, including novel ones not characterized by HPO. We also performed case studies on biomedical literature to illustrate how new phenotype information can be recognized and extracted. We compared current BERT-based and GPT-based models for phenotype tagging across multiple aspects, including model architecture, memory usage, speed, accuracy, and privacy protection. We also discussed adding a negation step and an HPO normalization layer to the transformer models to improve HPO term tagging. In conclusion, PhenoBCBERT and PhenoGPT enable the automated discovery of phenotype terms from clinical notes and biomedical literature, facilitating automated downstream tasks that derive new biological insights into human diseases.
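
    The abstract mentions adding a negation step after phenotype tagging but does not describe it. As a rough illustration only, the sketch below swaps in a simplified NegEx-style rule: a tagged term is dropped if a negation cue appears shortly before it, with the look-back cut at words like "but". The cue lists, window size, and function names are hypothetical and are not the PhenoBCBERT/PhenoGPT implementation.

```python
import re

# Hypothetical cue and terminator lists for illustration only; NegEx-style
# systems use much larger curated lists and richer scope rules.
NEGATION_CUES = ("no", "not", "denies", "without", "negative for")
SCOPE_TERMINATORS = ("but", "however", "although")

def _tokens(text: str) -> list[str]:
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def is_negated(sentence: str, term: str, window: int = 5) -> bool:
    """True if a negation cue precedes `term` within `window` tokens,
    with no scope terminator between the cue and the term."""
    tokens, term_toks = _tokens(sentence), _tokens(term)
    if not term_toks:
        return False
    n = len(term_toks)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != term_toks:
            continue
        preceding = tokens[max(0, i - window):i]
        # Cut the look-back at the last scope terminator (e.g., "but").
        for j in range(len(preceding) - 1, -1, -1):
            if preceding[j] in SCOPE_TERMINATORS:
                preceding = preceding[j + 1:]
                break
        context = " ".join(preceding)
        if any(re.search(rf"\b{re.escape(cue)}\b", context) for cue in NEGATION_CUES):
            return True
    return False

def drop_negated(sentence: str, tagged_terms: list[str]) -> list[str]:
    """Keep only tagged phenotype terms that are not negated in the sentence."""
    return [t for t in tagged_terms if not is_negated(sentence, t)]

sentence = "Patient denies seizures but has short stature."
print(drop_negated(sentence, ["seizures", "short stature"]))  # → ['short stature']
```

    Without the terminator rule, "denies" would also fall inside the five-token window before "short stature" and wrongly negate it, which is why scope termination matters even in a toy version.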

    Assessing the readiness of precision medicine interoperability: An exploratory study of the National Institutes of Health genetic testing registry

    Background: Precision medicine involves three major innovations currently taking place in healthcare: electronic health records, genomics, and big data. A major challenge for healthcare providers, however, is understanding how ready initiatives like precision medicine are for practical application.
    Objective: To better understand the current state and challenges of precision medicine interoperability, using a national genetic testing registry as a starting point and placing it in the context of established interoperability formats.
    Methods: We performed an exploratory analysis of the National Institutes of Health Genetic Testing Registry. Relevant standards included the Health Level Seven International Version 3 Implementation Guide for Family History, the Human Genome Organization Gene Nomenclature Committee (HGNC) database, and Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT). We analyzed the distribution of genetic testing laboratories, genetic test characteristics, and standardized genome/clinical code mappings, stratified by laboratory setting.
    Results: The registry contained a total of 25,472 genetic tests from 240 laboratories, testing for approximately 3,632 distinct genes. Most tests focused on diagnosis, mutation confirmation, and/or risk assessment of germline mutations that could be passed to offspring. All genes were successfully mapped to HGNC identifiers, but fewer than half of the tests mapped to SNOMED CT codes, highlighting significant gaps in linking genetic tests to the standardized clinical codes that explain the medical motivations behind test ordering.
    Conclusion: While precision medicine could potentially transform healthcare, successful practical and clinical application will first require the comprehensive and responsible adoption of interoperable standards, terminologies, and formats across all aspects of the precision medicine pipeline.
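
    The key result above is a coverage ratio: the share of tests carrying at least one SNOMED CT code, stratified by laboratory setting. The sketch below shows that calculation under stated assumptions: the GeneticTest record shape, its field names, and the setting labels are hypothetical (not the registry's actual schema), and the toy records with placeholder codes are invented for illustration, not registry data.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class GeneticTest:
    # Hypothetical record shape, not the Genetic Testing Registry's actual schema.
    test_id: str
    lab_setting: str  # illustrative labels such as "clinical" or "research"
    snomed_codes: list[str] = field(default_factory=list)

def snomed_coverage_by_setting(tests: list[GeneticTest]) -> dict[str, float]:
    """Share of tests with at least one SNOMED CT code, per laboratory setting."""
    totals: dict[str, int] = defaultdict(int)
    mapped: dict[str, int] = defaultdict(int)
    for t in tests:
        totals[t.lab_setting] += 1
        if t.snomed_codes:
            mapped[t.lab_setting] += 1
    return {setting: mapped[setting] / n for setting, n in totals.items()}

# Invented toy records with placeholder codes, not registry data.
toy = [
    GeneticTest("t1", "clinical", ["SCT-PLACEHOLDER-1"]),
    GeneticTest("t2", "clinical"),
    GeneticTest("t3", "research"),
]
print(snomed_coverage_by_setting(toy))  # → {'clinical': 0.5, 'research': 0.0}
```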

    Large Language Models for Granularized Barrett's Esophagus Diagnosis Classification

    Diagnostic codes for Barrett's esophagus (BE), a precursor to esophageal cancer, lack the granularity and precision needed for many research and clinical use cases, so laborious manual chart review is required to extract key diagnostic phenotypes from BE pathology reports. We developed a generalizable transformer-based method to automate this data extraction. Using pathology reports from Columbia University Irving Medical Center with gastroenterologist-annotated targets, we performed binary dysplasia classification as well as granularized multi-class classification of BE-related diagnoses. We used two clinically pre-trained large language models; the best-performing model was comparable to a highly tailored rule-based system developed using the same data. Binary dysplasia extraction achieved an F1-score of 0.964, and the multi-class model achieved an F1-score of 0.911. Our method is generalizable and faster to implement than a tailored rule-based approach.
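
    The results above are reported as F1-scores. As a reference for how such scores are computed, the sketch below gives one-vs-rest F1 for a single label and a macro-averaged F1 for the multi-class case. The abstract does not state which averaging the multi-class score uses, so macro averaging is an assumption here (micro or weighted averaging would give different numbers), and the class labels in the usage lines ("NDBE", "LGD", "HGD") are hypothetical toy labels, not the study's annotation scheme.

```python
def f1_for_label(y_true: list, y_pred: list, label) -> float:
    """One-vs-rest F1 for a single label: 2PR / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)

# Toy binary case: 1 = dysplasia present (hypothetical predictions)
print(f"{f1_for_label([1, 1, 0, 0, 1], [1, 0, 0, 1, 1], 1):.3f}")  # → 0.667
# Toy multi-class case with hypothetical labels
y_true = ["NDBE", "LGD", "HGD", "NDBE"]
y_pred = ["NDBE", "NDBE", "HGD", "NDBE"]
print(f"{macro_f1(y_true, y_pred):.3f}")  # → 0.600
```

    Macro averaging weights every class equally, so a rare class that is never predicted correctly (here "LGD") pulls the score down sharply; that sensitivity is why the averaging choice should be stated alongside multi-class F1.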